Incremental statestream checkpoints for replays #18213
Merged
LibretroAdmin merged 20 commits into libretro:master on Aug 30, 2025
Conversation
With this patch, a series of dozens or hundreds of states can take up less space
than a single uncompressed state. This makes it feasible to take one
checkpoint per minute, or even one per second, mitigating determinacy
issues in many cores.
Cores with higher memory usage may take more than one frame to
serialize or unserialize, which can lead to hitches during recording
and playback; as a reference point, ppsspp saves can be encoded or
decoded in roughly 10 ms. Desyncs may also occur on deserialize (some
cores, e.g. fceumm, exhibit this behavior; my guess is that these cores
clear inputs on deserialize). Addressing this may require optimizing the
encoding routine, per-core optimizations to serialization, or tracking
dirty regions of memory and adding new API to serialize/deserialize
just the changed memory areas.
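The core idea of block-based deduplication can be sketched as follows. This is a minimal illustration under assumed simplifications, not RetroArch's actual implementation: a fixed block size, a generic hash in place of the index used by the PR, and a plain dict as the block store.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative; the real format makes block size a parameter


def encode_checkpoint(state: bytes, store: dict):
    """Split a savestate into fixed-size blocks and emit one hash per block,
    storing a block's bytes only the first time that hash is seen."""
    refs, new_blocks = [], {}
    for off in range(0, len(state), BLOCK_SIZE):
        block = state[off:off + BLOCK_SIZE]
        h = hashlib.blake2b(block, digest_size=8).digest()
        if h not in store:
            store[h] = block
            new_blocks[h] = block
        refs.append(h)
    return refs, new_blocks


def decode_checkpoint(refs, store):
    # Reassemble the state by looking each block hash up in the store.
    return b"".join(store[h] for h in refs)


# Two checkpoints that share most of their memory: the second checkpoint
# only needs to store the blocks that actually changed since the first.
store = {}
state1 = bytes(64 * 1024)                     # 64 KiB of zeros
refs1, new1 = encode_checkpoint(state1, store)
state2 = state1[:BLOCK_SIZE] + b"\x01" + state1[BLOCK_SIZE + 1:]
refs2, new2 = encode_checkpoint(state2, store)

assert decode_checkpoint(refs2, store) == state2
# new1 holds one unique (all-zero) block; new2 holds only the one changed block.
```

The space win comes from `refs` being a few bytes per block while unchanged block payloads are stored exactly once across the whole stream.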
- Move input_driver.c bsvmovie code into a new bsvmovie.c
- New module: a hashmap-based uint32 index for block-based deduplication
- Bump the replay format version to include a bigger header (9 ints now),
and make the initial savestate a "CHECKPOINT2"-type checkpoint
(which holds compression and encoding info).
- The new header fields are frame count, superblock size, and block
size (the latter two are parameters on savestreams, per
https://github.com/sumitshetye2/v86_savestreams/ )
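As a rough illustration of the enlarged header, the sketch below packs and unpacks 9 little-endian uint32s. Only the fields named above (frame count, superblock size, block size) come from this PR; the remaining six slots are placeholders, and the actual field order and layout in the replay format are not specified here.

```python
import struct

# Hypothetical layout: 9 little-endian uint32s. Field order and the
# six trailing fields are placeholders for illustration only.
HEADER_FMT = "<9I"


def pack_header(frame_count, superblock_size, block_size, other=(0,) * 6):
    assert len(other) == 6
    return struct.pack(HEADER_FMT, frame_count, superblock_size, block_size, *other)


def unpack_header(raw):
    fields = struct.unpack(HEADER_FMT, raw)
    return {
        "frame_count": fields[0],
        "superblock_size": fields[1],
        "block_size": fields[2],
    }


hdr = pack_header(frame_count=3600, superblock_size=1 << 20, block_size=4096)
assert len(hdr) == 36  # 9 ints * 4 bytes
assert unpack_header(hdr)["block_size"] == 4096
```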
This technique is broadly complementary to delta-encoding, and in the
future I think they could be combined (delta encoding for small
changes, deduplicated block encoding when deltas become large or a lot
of time has passed). The current checkpoint2 format can support
checkpoints using different encoding schemes and compression codecs,
so this shouldn't pose any difficulty down the road.
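The complementarity can be seen in a toy experiment (not this PR's code): an XOR delta between consecutive states is almost all zeros when few bytes changed, so it compresses to almost nothing, but once a large fraction of memory has churned the delta carries all of that churn, and block-level deduplication of the untouched regions becomes the better strategy.

```python
import zlib


def xor_delta(prev: bytes, cur: bytes) -> bytes:
    # Byte-wise XOR: unchanged bytes become 0, so sparse changes
    # yield a highly compressible delta.
    return bytes(a ^ b for a, b in zip(prev, cur))


prev = bytes(64 * 1024)

# One changed byte: the compressed delta is tiny.
small = bytearray(prev)
small[100] = 0xFF
small_size = len(zlib.compress(xor_delta(prev, bytes(small))))

# Half the state rewritten with pseudo-random bytes (a simple LCG):
# the delta is as large as the changed region and barely compresses.
big = bytearray(prev)
x = 12345
for i in range(32 * 1024):
    x = (1103515245 * x + 12345) % (1 << 31)
    big[i] = (x >> 16) & 0xFF
big_size = len(zlib.compress(xor_delta(prev, bytes(big))))

assert small_size < big_size  # deltas stop paying off as changes grow
```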
This also gives us a way (eventually) to rewind replays arbitrarily far
back, checkpoint by checkpoint, without relying on the rewind buffer.
That will come in a later PR, as changes to state_manager.c.
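Rewinding a replay to an arbitrary frame would amount to locating the latest checkpoint at or before the target frame, restoring it, and re-simulating forward. A minimal sketch of the lookup step, with a hypothetical list of checkpointed frame numbers rather than state_manager.c's actual data structures:

```python
import bisect


def nearest_checkpoint(checkpoint_frames, target_frame):
    """Return the latest checkpointed frame <= target_frame, or None.

    checkpoint_frames must be sorted ascending (checkpoints are taken
    in playback order, so this holds naturally).
    """
    i = bisect.bisect_right(checkpoint_frames, target_frame)
    return checkpoint_frames[i - 1] if i else None


frames = [0, 60, 120, 180]  # e.g. one checkpoint per second at 60 fps
assert nearest_checkpoint(frames, 150) == 120  # restore 120, replay 30 frames
assert nearest_checkpoint(frames, 60) == 60    # exact hit, no re-simulation
```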
- lower zstd compression level (3)
- higher hashmap capacity in savestream index
- bigger block size for savestreams
- use xxhash for savestream index
- track counts and print histogram (in debug mode) to understand index usage
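The debug histogram mentioned above is about understanding how keys spread across the index's buckets. A minimal sketch of that kind of instrumentation, using Python's built-in `hash` and a plain dict in place of the actual uint32 index:

```python
from collections import Counter


def bucket_histogram(index: dict, num_buckets: int):
    """Debug aid: count how many keys land in each bucket, then
    histogram the bucket occupancies to spot clustering (many keys
    piling into few buckets means wasted capacity elsewhere)."""
    occupancy = Counter(hash(k) % num_buckets for k in index)
    return Counter(occupancy.values())


index = {f"block-{i}": i for i in range(1000)}
hist = bucket_histogram(index, 256)
# Every key is accounted for: sum of (occupancy * bucket count) == key count.
assert sum(k * v for k, v in hist.items()) == 1000
```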
…ol which is also set to false after rewinds or state loads, for use during both recording and replay.
Also tune settings a bit more for large memories.
Set checkpoint compression based on savestate compression flag.
Contributor
Author
Here are the results of my benchmarking; as I see it, there is very little time overhead for incremental statestream encoding. These were recorded on a plugged-in laptop with an AMD Ryzen 7 6800U.
One config flag is added:
replay_checkpoint_deserialize, enabled by default (the current behavior). It can be disabled for cores like fceumm that do not tolerate having states loaded during replay playback.
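In effect, the flag gates whether playback actually restores state when it reaches a checkpoint. A hypothetical sketch of that gate; the function and config names below are illustrative, not RetroArch's actual code:

```python
def on_checkpoint_reached(config, load_state, checkpoint_data):
    """Restore the checkpointed state during playback unless the user
    has disabled checkpoint deserialization (default: enabled)."""
    if config.get("replay_checkpoint_deserialize", True):
        load_state(checkpoint_data)  # resync playback to the checkpoint
    # else: skip the load for cores (e.g. fceumm) that desync on deserialize


loaded = []
on_checkpoint_reached({}, loaded.append, b"state")   # default: loads
on_checkpoint_reached({"replay_checkpoint_deserialize": False},
                      loaded.append, b"ignored")     # disabled: skipped
assert loaded == [b"state"]
```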